N.B. GAMLSS models cannot be fitted using bam or bamV. Big Data methods for GAMLSS models in mgcv are still under development.

Grip strength modelling

We consider a simple data set containing data on grip strength (HG) vs age:

library(gamlss.data)
data(grip)

plot(grip ~ age, data = grip)

We fit a standard Gaussian GAM, and we convert it to mgcViz

library(mgcViz)
fit1 <- gamV(grip ~ s(age), 
             data = grip, 
             aViz = list(nsim = 50))
fit1 <- getViz(fit1, nsim = 50) 

Then we check the residuals mean of a grid of values along age:

tmp <- check(fit1)
## 
## Method: REML   Optimizer: outer newton
## full convergence after 5 iterations.
## Gradient range [-0.001415483,0.001168618]
## (score 12186.55 & scale 37.74976).
## Hessian positive definite, eigenvalue range [1.553286,1882.003].
## Model rank =  10 / 10 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##          k'  edf k-index p-value
## s(age) 9.00 4.35       1    0.46
check1D(fit1, "age") + l_gridCheck1D() 

Now we check the residuals standard deviation along age

check1D(fit1, "age") + l_gridCheck1D( sd ) # <- notice that we are looking at stan dev

The variance is clearly increasing with age. We take this into account using a gaulss model

fit2 <- gamV(list(grip ~ s(age), ~ s(age)),
             data = grip, 
             family = gaulss,
             aViz = list(nsim = 50))

and we repeat the check

check1D(fit2, "age") + l_gridCheck1D( sd )

The residual check now looks ok. These are the estimated mean and log stand. dev. effects

print(plot(fit2), pages = 1)

We looks at a residual QQ-plot

qq(fit2, CI = "normal")

The residuals look skewed to the right. We can also know check whether the residual skewness (asymmetry) changes with age

library(e1071)
check1D(fit2, "age") + l_gridCheck1D( skewness )

It seems that the skewness decreases with age. It would have been difficult to see this directly from the data

plot(grip ~ age, data = grip)

We allow for skewness by adopting a shash model

library(mgcFam)
fit3 <- gamV(list(grip ~ s(age), # location
                  ~ s(age),      # scale (variance)
                  ~ s(age),      # skewness (asymmetry)
                  ~ 1),          # kurtosis (tail behaviour)
             data = grip, 
             family = shash,
             aViz = list(nsim = 50))

check1D(fit3, "age") + l_gridCheck1D( skewness )

The skewness seems to be better modelled now, and we achieve lower AIC:

AIC(fit1, fit2, fit3)
##             df      AIC
## fit1  6.603372 24369.57
## fit2 13.896755 24049.30
## fit3 15.237918 23873.78

The next step is to have a look at residual kurtosis (tail behaviour), this could be done by:

check1D(fit3, "age") + l_gridCheck1D( kurtosis )

but we leave it for next time.

–> –> –> –> –> –>

Quantile modelling: the motorcycle data set

Let us consider the motorcycle data set:

library(MASS)
data(mcycle)
plot(mcycle)

A good Gaussian GAMLSS model for this data is

library(mgcViz)

fitG <- gamV(list(accel ~ s(times, k=20, bs="ad"), 
                        ~ s(times)),
             data=mcycle, 
             family=gaulss())

print(plot(fitG), pages = 1)

We can obtain the quantiles by inverting the estimated Gaussian CDF

qu <- c(0.1, 0.25, 0.5, 0.75, 0.9)

q_hat <- lapply(qu, 
                function(.q)
                  qnorm(.q, mean = fitG$fitted.values[ , 1], 
                            sd = 1 / fitG$fitted.values[ , 2]))

and we can then plot them on top of the data

plot(mcycle, ylim = c(-150, 75))
for(ii in 1:5) lines(mcycle$times, q_hat[[ii]], col = 2)

Now we fit a quantile GAM for quantile 0.9

fit09 <- qgam(accel ~ s(times, k=20, bs="ad"), 
              data = mcycle, 
              qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.9....................done

We now plot the quantile with confidence intervals

pred <- predict(fit09, se = TRUE)

plot(mcycle, ylim = c(-140, 85))
lines(mcycle$times, pred$fit)
lines(mcycle$times, pred$fit + 2 * pred$se, col = 2)
lines(mcycle$times, pred$fit - 2 * pred$se, col = 2)

There are two problems with this fit:

We address these issues by modelling the learning rate as well

fit09_2 <- qgam(list(accel ~ s(times, k=20, bs="ad"),
                           ~ s(times)),
                 data = mcycle, 
                 qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.9..........done

Again predict and plot

pred <- predict(fit09_2, se = TRUE)

plot(mcycle, ylim = c(-140, 90))
lines(mcycle$times, pred$fit)
lines(mcycle$times, pred$fit + 2 * pred$se, col = 2)
lines(mcycle$times, pred$fit - 2 * pred$se, col = 2)

Looks much better! Recall that

class(fit09_2)
## [1] "qgam" "gam"  "glm"  "lm"

hence we can do, for instance

summary(fit09_2)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)    2.182      1.755   1.244    0.214
## 
## Approximate significance of smooth terms:
##            edf Ref.df Chi.sq p-value    
## s(times) 11.05  12.66    393  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.676   Deviance explained = 97.3%
## -REML = 578.21  Scale est. = 1         n = 133

We can use mgcViz plots by converting as usual

fit09_2 <- getViz(fit09_2)

or using shortcut

fit09_2 <- qgamV(accel ~ ... 

We fit multiple quantiles using the mqgam function

fitMQ <- mqgam(list(accel ~ s(times, k=20, bs="ad"),
                            ~ s(times)),
               data = mcycle, 
               qu = c(0.1, 0.25, 0.5, 0.75, 0.9)) 
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5..............done 
## qu = 0.25..........................done 
## qu = 0.75.........done 
## qu = 0.1..............done 
## qu = 0.9..........done

This should be manipulated using the qdo function

qdo(fitMQ, 0.1, summary)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -54.535      2.367  -23.04   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df Chi.sq p-value    
## s(times) 8.323  9.153  932.5  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.649   Deviance explained = 89.9%
## -REML = 593.53  Scale est. = 1         n = 133

To plot one of the fitted quantile models we do

qdo(fitMQ, 0.1, plot)

To plot effect of several quantile at once we use mgcViz

fitMQ_viz <- getViz(fitMQ)

plot(fitMQ_viz)

We can plot the fitted quantile on the data

plot(mcycle, ylim = c(-150, 90))
for(ii in 1:5) lines(mcycle$times, predict(fitMQ_viz[[ii]]), col = 2)

NB output of getViz is simply a list of gam models, we don’t need to use qdo, instead

predict(fitMQ)
# Error in UseMethod("predict") : 
# no applicable method for 'predict' applied to an object of class "mqgam" 

But there is a memory price to pay (small for this small data set)

object.size(fitMQ) / 1e6

object.size(fitMQ_viz) / 1e6

To recap we can do

Rainfall modelling in Parana state of Brasil

Here we are going to model weekly rainfall (mm/week) from over 600 weather station in the Parana state of Brazil. The data comes comes from the INLA R package. The meaning of most variables should be evident.

library(testGam)
data("parana")

plot(parana[parana$TIME==1, ]$LO, parana[parana$TIME==1, ]$LA, xlab = "LO", ylab = "LA") 

We fit a quantile GAM for \(\tau = 0.8\) with an isotropic effect for longitude and latitude, a cyclic effect for the time of the year and smooth effects for distance from the sea and elevation:

library(mgcViz)
fit <- qgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL),
             data = parana,
             qu = 0.8)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.8............done
print(plot(fit), pages = 1)

Hence we can do model checking in mgcViz. However standard tools are inadequate

qq(fit, CI = "normal") 

But we can verify number of observations falling below the fit

mean( (parana$PREC - fit$fitted.values) < 0 )
## [1] 0.803384

and mgcViz provides a specific layer for doing this along one covariate

check1D(fit, x = "EL") + l_gridQCheck1D() 

or two covariates

check2D(fit, x1 = "LO", x2 = "LA") + l_gridQCheck2D() 

We can of course fit this model to several quantiles:

fitM <- mqgamV(PREC ~ s(LO, LA, k = 25) + s(seaDist) + s(TIME, bs = "cc") + s(EL),
               data = parana,
               qu = seq(0.1, 0.9, length.out = 5))
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5................done 
## qu = 0.3...................done 
## qu = 0.7...............done 
## qu = 0.1......................done 
## qu = 0.9............done
plot(fitM, select = 1)

print(plot(fitM, select = 2:4), pages = 1)

Notice that the spatial effect seems much stronger for high quantiles than for the low ones. The same is true for distance from the sea and seasonality (TIME), while the effect of elevation is not significant from qu = 0.9. We can visualize the spatial effect in 3D as follows:

plotRGL(sm(fitM[[5]], 1))

We might also wonder whether the spatial effect changes with time. To verify this here we construct a tensor product smooth between the 2D thin-plate-spline spatial effect and the cyclical effect of time. We simplify the model by removing the effect of seaDist.

fit4 <- qgamV(PREC ~ te(LO, LA, TIME, d = c(2, 1), k = c(20, 10), bs = c("tp", "cc")) + s(EL),
              data = parana,
              qu = 0.9)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.9........done
plotSlice(sm(fit4, 1),
          fix = list("TIME" = round(seq(1, 53, length.out = 6)))) + l_fitRaster()

You can plot any slice in 3D by doing:

plotRGL(sm(fit4, 1), fix = c("TIME" = 11))